Imitation with incomplete information in 2x2 games 
Evolutionary game theory has been an important tool for describing economic and social
behaviour for decades. Approximate mean value equations describing the time evolution of strategy
concentrations can be derived from the players' microscopic update rules. We show that they can
be generalized to a learning process. As an example, we compare a restricted imitation process,
in which unused parts of the role model's meta-strategy are hidden from the imitator, with the
widely used imitation rule that allows the imitator to adopt the entire meta-strategy of the role
model. This change in imitation behaviour greatly affects the dynamics and stationary states in the
iterated Prisoner Dilemma. In particular, we find Grim Trigger to be a more successful strategy than
Tit-For-Tat, especially in the weak selection regime.
I. INTRODUCTION

In evolutionary game theory [1-4] the success of selfish 
individuals (players) is determined by their interaction 
with other players. The most extensively studied evo- 
lutionary two player game is the iterated (or repeated) 
Prisoner Dilemma (IPD) [5]. The rules of the Prisoner 
Dilemma game are brief and simple yet the parallels that 
can be drawn to human behaviour are manifold. The 
evolutionary aspect of the IPD is commonly introduced 
by imitative behaviour of the players or a reproduction 
process. Here we investigate a restriction on the play- 
ers' ability to imitate, in which imitators can only use 
information obtained by interacting with the role model. 
We incorporate this adjustment in the approximate mean 
value equation [6] that describes the time evolution of 
strategy concentrations. In particular, we consider the
simplest case of a one-step memory, where players remember
what happened in the last encounter of the game. We
show that while populations eventually reach a cooperative
equilibrium, Grim Trigger (GT) is the dominant strategy
and the only surviving strategy in the weak-selection
regime with partial imitation.

The Prisoner Dilemma and other symmetric 2x2 
games (such as the Chicken or Snowdrift game and the 
Stag-Hunt game) are defined by the following set of rules. 
When two players play a game each of them chooses to
cooperate (C) or defect (D). Based on these two choices the
two players are attributed a payoff. A cooperating player 
scores R (S) if his opponent cooperates (defects) and a 
defecting player scores T (P) if his opponent cooperates 
(defects). Usually R is called the Reward for coopera- 
tion, S the Sucker's payoff, T the Temptation to defect 
and P the Punishment for mutual defection. The Prisoner
Dilemma game imposes the following restrictions on
the payoff parameters: T > R > P > S and 2R > T + S
to prevent collusion. The tragedy of the Prisoner Dilemma
is rooted in these restrictions on the payoff parameters.
The strategy with the highest expected payoff for a single player
is D, while the strategy that yields the highest payoff
for the population is C. The PD is thus a non-zero-sum
game, as one player's loss does not equal his opponent's
gain. As defection dominates cooperation, defection is
the rational strategy to choose for any player and mu- 
tual defection is the only Nash equilibrium. Cooperation 
is however a widespread phenomenon in nature and the 
study of the emergence of cooperation in a population of 
selfish individuals has been one of the key objectives in 
evolutionary game theory over the past few decades. 
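The dominance argument above can be checked mechanically. The following is a minimal sketch, assuming the payoff values T = 5, R = 3, P = 1, S = 0 that the paper fixes later in section II; the helper names `best_reply` and `is_nash` are illustrative and not part of the original work.

```python
# Payoff values fixed later in the paper (section II); used here only for illustration.
T, R, P, S = 5, 3, 1, 0

# payoff[(my_move, opponent_move)] -> my payoff in a single game
payoff = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

def best_reply(opponent_move):
    """Move that maximises my one-shot payoff against a fixed opponent move."""
    return max("CD", key=lambda m: payoff[(m, opponent_move)])

def is_nash(a, b):
    """True if neither player gains by unilaterally deviating from (a, b)."""
    return best_reply(b) == a and best_reply(a) == b

# All pure-strategy Nash equilibria of the one-shot game.
nash = [(a, b) for a in "CD" for b in "CD" if is_nash(a, b)]

# Outcome with the highest combined payoff for the two players.
best_total = max(payoff, key=lambda ab: payoff[ab] + payoff[ab[::-1]])
```

With these values defection is the best reply to either move, so mutual defection is the only entry in `nash`, while `best_total` is mutual cooperation.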

Nowak and his collaborators have contributed enor- 
mously to this topic [4, 7-13] and particularly they 
found five rules of cooperation [14]. Among them, di- 
rect [12, 14, 15] and indirect reciprocity [14, 16] may lead 
to cooperative behaviour. In complex networks, coopera- 
tors can support each other in more than one dimension, 
as for example in [9, 10, 12, 13, 17-19]. Qin et al. observe 
the emergence of cooperation among players with the 
ability to recall their payoff over several generations [20]. 
If players may recall what happened in the last game 
they may employ the famous Tit-For-Tat (TFT), Pavlov 
and Grim Trigger (GT) strategies. More sophisticated 
players that make decisions based on their own and their 
opponent's moves in recent games are used in [21-23]. 
We follow a similar approach in this paper. Our play- 
ers decide how to play based only on the outcome of the 
most recent game. 

In evolutionary game theory it is common to define 
a measure for fitness that is a monotonically increasing
function of the payoff and to assume that higher fitness 
leads to higher reproduction rates. Approximate mean 
value or replicator equations may be used to predict the 
fate of different strategies for large, well mixed populations.
Players with a tendency to imitate promising strategies
are commonly used at the microscopic level. However,
if strategies more complex than simply cooperate or defect
(so-called meta-strategies) are used, imitating may
turn out to be tricky. Imagine the situation of a novice
chess player trying to learn from more experienced partic- 
ipants in a chess tournament. The novice may gradually 
improve his game by incorporating the behaviour of bet- 
ter participants in his own strategy, but he will not be 
able to extract and learn the complete strategy of other 
players immediately. In the same spirit, the partial imi- 
tation learning process involves imitation of the exposed 
part of the role model's strategy. A detailed example is il- 
lustrated in section II B. In the context of the PD game, 
we introduce the approximate mean value equation for 
the partial Imitation Rule (pIR) and compare it with the 
traditional Imitation Rule (tIR), where the complete
strategy can be copied by the learner. 

The rest of the present paper is structured as follows: 
in section II we describe our methods, results are pre- 
sented in section III and discussed in section IV. We draw 
our conclusion in section V. 



II. METHODS 

In section II A we give a precise description of memory 
and define the possible one-step strategies that appear 
in this paper. The partial Imitation Rule (pIR) is de- 
scribed in detail in section II B and the arising equations 
describing macroscopic dynamics are discussed in II C. 
A. Memory

The ensemble of possible strategies for players with n-step memory is denoted as
$\mathcal{M}_n$. As mentioned previously we allow our agents to play moves based on their own
previous move and the move of their opponent. Therefore we need an encoding scheme for
$\mathcal{M}_1$ strategies. As every player has two choices for each move (D or C) there are 4
possible outcomes (DD, DC, CD and CC) with payoffs P, T, S and R every time the game is played.
Thus we need 4 responses $S_P$, $S_T$, $S_S$ and $S_R$ for the DD, DC, CD and CC histories of
the last game respectively. The agents also need to know how to start playing if there is no
history, so we add a first move $S_0$, which adds up to a total of 5 moves for a one-step memory
strategy. A strategy in $\mathcal{M}_1$ is thus denoted as $S_0|S_P S_T S_S S_R$, where $S_0$ is
the first move and $S_P$, $S_T$, $S_S$ and $S_R$ are the moves that follow DD, DC, CD and CC
histories respectively. Thus there are $|\mathcal{M}_1| = 2^5 = 32$ possible strategies as there
are two choices for each $S_i$, either C or D. In table I this scheme is illustrated along with
three famous strategies (Grim Trigger, Tit-For-Tat and Pavlov) and the groups of 4 strategies that
always defect (cooperate) in practice. The aforementioned encoding scheme is easily generalized to
$\mathcal{M}_n$. A treatment of players with two-step, three-step and even longer memory can be
found in [21-23]. Note that the total number of possible strategies $|\mathcal{M}_n|$ increases
exponentially with n.

B. partial Imitation Rule (pIR)

As mentioned earlier pIR should be a reasonable restriction of the agents' abilities to imitate.
We explain in more detail using a concrete example. Consider Alice using strategy D|DDDD =
$S_0^A|S_P^A S_T^A S_S^A S_R^A$ playing against Bob, who is himself a TFT (C|DCDC =
$S_0^B|S_P^B S_T^B S_S^B S_R^B$) strategist. The transition graph for this encounter is shown in
figure 1. From Alice's point of view the first outcome is $S_0^A S_0^B$ = DC, hence she uses
$S_T^A$ = D for her next move. Similarly Bob uses $S_S^B$ = D. The outcome of the second game is
thus $S_T^A S_S^B$ = DD. Both players then use their $S_P$ move D for the third game. All
subsequent outcomes are therefore $S_P^A S_P^B$ = DD and Alice's and Bob's recurrent state payoff,
i.e. the average payoff per game in the limit where infinitely many games are played between the
two, is P. In summary, Alice plays $S_0^A S_T^A S_P^A S_P^A \ldots$ and Bob plays
$S_0^B S_S^B S_P^B S_P^B \ldots$. In the framework of pIR Alice now imitates Bob. As she has only
witnessed him play his $S_0$, $S_P$ and $S_S$ moves she will only adopt these moves from Bob's
strategy. Her strategy will be changed in the imitation process as

$$S_0^A|S_P^A S_T^A S_S^A S_R^A \xrightarrow{\,S_0^B|S_P^B S_T^B S_S^B S_R^B\,} S_0^B|S_P^B S_T^A S_S^B S_R^A , \quad \text{or} \quad \mathrm{D|DDDD} \xrightarrow{\,\mathrm{C|DCDC}\,} \mathrm{C|DDDD} ,$$

where $i \xrightarrow{k} j$ means that strategy i turns into strategy j by imitating strategy k.
The next round Alice will thus be playing C|DDDD. We see that the result of such an imitation
process may differ from a complete imitation of the role model's strategy (in this case Alice
would simply become a TFT player) even in the simple case of players with one-step memory. To
avoid confusion we refer to the rule that allows the imitator to copy the entire strategy of the
role model as the traditional Imitation Rule (tIR). The example illustrates how pIR strips the
imitators of a supernatural ability to "mind read" the role models and thereby extract hidden
parts of their strategy. Note that in the case where players do not have memory, i.e. are simply
cooperators or defectors, there is no need for a distinction between tIR and pIR. If agents with
different memory lengths are interacting we need to specify how an agent Alice with an n-step
memory imitates an agent Bob with a longer m-step memory if the sequence of Bob's moves witnessed
by Alice cannot be mapped on an n-step memory strategy. As only one-step memory players are used
here we do not address this issue in this paper.

[Figure 1: transition graph DC (T,S) -> DD (P,P), with DD as the recurrent state.]

FIG. 1. Transition graph between the D|DDDD strategist Alice and the TFT (C|DCDC) strategist Bob.
In parentheses the payoff of Alice, Bob from the corresponding state. The payoff from the
recurrent state is P for both strategies.
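The encoding of one-step memory strategies and the Alice and Bob example of section II B can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes infinitely many games per encounter, so that every move the role model plays along the transient and recurrent part of the transition graph counts as witnessed. The names `parse`, `witnessed_slots` and `partial_imitation` are hypothetical.

```python
from itertools import product

# Position of the responses S_P, S_T, S_S, S_R in the move tuple, keyed by the
# last outcome written as (own move, opponent's move). Position 0 holds S_0.
SLOTS = {"DD": 1, "DC": 2, "CD": 3, "CC": 4}

def parse(code):
    """'C|DCDC' -> ('C', 'D', 'C', 'D', 'C') = (S_0, S_P, S_T, S_S, S_R)."""
    first, rest = code.split("|")
    return (first, *rest)

def show(moves):
    """Inverse of parse: move tuple -> 'S_0|S_P S_T S_S S_R' string."""
    return moves[0] + "|" + "".join(moves[1:])

# All 2^5 = 32 one-step memory strategies.
ALL_STRATEGIES = [show(m) for m in product("CD", repeat=5)]

def witnessed_slots(imitator, model):
    """Positions of the role model's moves that it actually plays against the imitator."""
    a, b = parse(imitator), parse(model)
    move_a, move_b = a[0], b[0]
    seen_states, used = set(), {0}          # the first move S_0 is always witnessed
    while (move_a, move_b) not in seen_states:
        seen_states.add((move_a, move_b))
        slot_a = SLOTS[move_a + move_b]     # history from the imitator's point of view
        slot_b = SLOTS[move_b + move_a]     # same history from the role model's point of view
        used.add(slot_b)
        move_a, move_b = a[slot_a], b[slot_b]
    return used

def partial_imitation(imitator, model):
    """pIR: copy only the witnessed moves, keep the imitator's own moves elsewhere."""
    a, b = parse(imitator), parse(model)
    used = witnessed_slots(imitator, model)
    return show(tuple(b[i] if i in used else a[i] for i in range(5)))

def traditional_imitation(imitator, model):
    """tIR: the imitator adopts the complete strategy of the role model."""
    return model

alice_after_pir = partial_imitation("D|DDDD", "C|DCDC")
alice_after_tir = traditional_imitation("D|DDDD", "C|DCDC")
```

For Alice (D|DDDD) imitating Bob (C|DCDC) the sketch reproduces the transition described in the text: under pIR she ends up with C|DDDD, whereas under tIR she would become a TFT player.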

In order for our players to effectively make use of their 
memory and strategies, players need to play repeatedly 
against their opponents. We denote the number of games
played per encounter of two players as $l$. This number
affects the performance of different strategies and the fate
of the population in a complex way and this discussion
is beyond the scope of this paper. If we assume that
two players always play many games against each other
before they switch to another opponent, then the players
will mostly find themselves in recurrent states of the
transition graph. In the limit where $l \to \infty$ the average
payoff per game played is the average payoff scored
in the recurrent states of the transition graphs. Thus in
our large, well mixed population the payoff of a player
playing strategy i is well approximated by

$$U_i = \sum_j \rho_j U_{ij} , \tag{1}$$

where $\rho_j$ is the (number) density, concentration or fraction
of j-strategists in the population and $U_{ij}$ is the average
recurrent state payoff obtained by an i-strategist
playing against a j-strategist. We fix our PD payoff parameters
to a set of commonly used values that offer high
temptation to defecting players: T = 5, R = 3, P = 1,
S = 0. The same set was also used in Axelrod's famous
computer tournaments [2].
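The payoff approximation described above, in which each recurrent state payoff is weighted by the density of the corresponding opponent strategy, can be sketched as follows. The sketch assumes the payoff values T = 5, R = 3, P = 1, S = 0 and deterministic one-step memory strategies written as five-letter codes; the function names `recurrent_payoff` and `mean_payoffs` are illustrative only.

```python
# Payoff of the focal player for each outcome (own move, opponent's move): P, T, S, R.
PAYOFF = {"DD": 1, "DC": 5, "CD": 0, "CC": 3}
# Position of the response to each outcome in the code with the '|' removed.
SLOTS = {"DD": 1, "DC": 2, "CD": 3, "CC": 4}

def recurrent_payoff(code_i, code_j):
    """Average payoff per game of an i-strategist against a j-strategist
    on the recurrent cycle of their transition graph."""
    a = code_i.replace("|", "")
    b = code_j.replace("|", "")
    move_a, move_b = a[0], b[0]
    path = []                                   # outcomes seen so far, from i's point of view
    while move_a + move_b not in path:
        path.append(move_a + move_b)
        slot_a = SLOTS[move_a + move_b]
        slot_b = SLOTS[move_b + move_a]
        move_a, move_b = a[slot_a], b[slot_b]
    cycle = path[path.index(move_a + move_b):]  # drop the transient states
    return sum(PAYOFF[s] for s in cycle) / len(cycle)

def mean_payoffs(strategies, rho):
    """U_i as the density-weighted sum of the recurrent payoffs U_ij."""
    return [sum(r * recurrent_payoff(si, sj) for sj, r in zip(strategies, rho))
            for si in strategies]

# Example: a population that is half D|DDDD and half TFT (C|DCDC).
U = mean_payoffs(["D|DDDD", "C|DCDC"], [0.5, 0.5])
```

In this example the D|DDDD players score P against everyone, while the TFT players score P against D|DDDD and R against each other.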



C. Macroscopic Dynamics

At the macroscopic level we are interested in players that do not necessarily make perfect
decisions when imitating other players. To account for these irrational decisions we assign a
certain probability to every possible imitation process depending on the payoff difference
between the imitator and the role model. If player i has been determined as a possible imitator
of player j the imitation (with tIR or pIR) occurs with a probability given by a monotonically
increasing smoothing function $g(\Delta U)$, where $\Delta U = U_j - U_i$ is the payoff difference
between player i and j. This translates into the fact that the more successful players are, the
more likely they are to be imitated. With our choice of smoothing function g the imitation
probability is given by

$$P(i \text{ imitates } j) = g(\Delta U) = \frac{1}{1 + \exp\left(-\frac{\Delta U}{K}\right)} = \frac{1}{1 + \exp\left(-\frac{U_j - U_i}{K}\right)} , \tag{2}$$

where $K > 0$ is a temperature-like noise factor controlling the extent of irrationality among
the players [6].

By using this combination of pIR and the smoothing function g defined above, which depends on the
payoff of the players, we make assumptions about the availability and reliability of information.
The details about encounters between two players Alice and Bob (i.e. the exact moves played
during the encounter) are known to Alice and Bob only. We also assume that Alice and Bob do not
make any mistakes when memorizing moves and when using their strategy to play. However the
information about the total payoff of players is available globally to all players but associated
with an uncertainty whose extent is controlled by the noise factor K. In this way the players'
first hand information is reliable but severely limited if the imitation process is based on pIR,
whereas information about the wealth of the players is available globally but is not perfectly
reliable.

It is possible to determine macroscopic dynamics from the microscopic update rules for both tIR
and pIR [6] (see appendix A). The approximate mean value equation for tIR is

$$\frac{d\rho_i^{\mathrm{tIR}}}{dt} = \rho_i \sum_j \rho_j \left[ g(U_i - U_j) - g(U_j - U_i) \right] , \tag{3}$$

where all the sums are carried out over all considered strategies. If, as in the case of pIR, the
imitator may adopt a strategy that is different from the role model's strategy, the approximate
mean value equation is

$$\frac{d\rho_i}{dt} = \sum_j \rho_j \sum_k \rho_k\, g(U_j - U_k)\, p(k,j,i) - \rho_i \sum_j \rho_j\, g(U_j - U_i) , \tag{4}$$

where

$$p: M \times M \times M \to [0, 1] \tag{5}$$

is a function whose value $p(k,j,i)$ is the probability that a k-strategist will become an
i-strategist by imitating a j-strategist and M is the space of all allowed strategies. Because
the player must use a strategy after the imitation process we have the restriction

$$\sum_i p(k,j,i) = 1 \quad \text{for all } k, j \in M . \tag{6}$$
For our model of partial imitation, this probability is
reduced to a simple form

$$p_{\mathrm{pIR}}: M \times M \times M \to \{0, 1\} \tag{7}$$

$$p_{\mathrm{pIR}}(k,j,i) = \begin{cases} 1 & \text{if a } k\text{-strategist imitating a } j\text{-strategist becomes an } i\text{-strategist} \\ 0 & \text{otherwise} \end{cases} \tag{8}$$



The approximate mean value equation for pIR is thus

$$\frac{d\rho_i^{\mathrm{pIR}}}{dt} = \sum_j \rho_j \sum_k \rho_k\, g(U_j - U_k)\, p_{\mathrm{pIR}}(k,j,i) - \rho_i \sum_j \rho_j\, g(U_j - U_i) . \tag{9}$$



III. RESULTS

The dynamics of the two imitation rules are compared
in section III A and the equilibrium strategy fractions
presented in section III B.
FIG. 2. Strategy concentrations $\rho$ as function of time t for
different noise factors K. Results from numerical integration
of the approximate mean value equations for tIR (3) and pIR
(9) with initial condition $\rho_i(t = 0) = 1/|\mathcal{M}_1|$ for $i = 1, 2, \ldots, n$,
where $n = |\mathcal{M}_1| = 32$ is the number of strategies in $\mathcal{M}_1$.
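The numerical integration of equations (3) and (9) mentioned in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a deterministic imitation rule given as a table `target[k][j]` (the strategy a k-strategist holds after imitating a j-strategist), uses explicit Euler steps because the source does not name its integrator, and is run on a hypothetical two-strategy toy input rather than on the 32 strategies of the paper.

```python
import math

def g(dU, K):
    """Smoothing function of equation (2); the two branches avoid overflow in exp."""
    x = dU / K
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def rhs(rho, U, target, K):
    """Right-hand side of the mean value equation for a deterministic imitation rule.

    U[i][j]      : recurrent state payoff of strategy i against strategy j
    target[k][j] : strategy adopted by a k-strategist imitating a j-strategist
                   (target[k][j] == j for every k, j corresponds to tIR)
    """
    n = len(rho)
    payoff = [sum(U[i][j] * rho[j] for j in range(n)) for i in range(n)]
    drho = [0.0] * n
    for k in range(n):
        for j in range(n):
            flow = rho[k] * rho[j] * g(payoff[j] - payoff[k], K)
            drho[target[k][j]] += flow   # gain of the strategy that is adopted
            drho[k] -= flow              # loss of the imitator's old strategy
    return drho

def integrate(rho0, U, target, K, dt=0.01, steps=1000):
    """Explicit Euler integration (an assumed, deliberately simple scheme)."""
    rho = list(rho0)
    for _ in range(steps):
        drho = rhs(rho, U, target, K)
        rho = [r + dt * d for r, d in zip(rho, drho)]
    return rho

# Hypothetical toy input: memoryless cooperators (index 0) and defectors (index 1)
# with payoffs R, S, T, P = 3, 0, 5, 1. Without memory tIR and pIR coincide.
U_toy = [[3.0, 0.0], [5.0, 1.0]]
tIR_toy = [[0, 1], [0, 1]]
rho_end = integrate([0.5, 0.5], U_toy, tIR_toy, K=1.0, steps=200)
```

Because every flow that leaves one strategy enters another, the total density stays at 1 up to rounding error. For the full model one would pass the 32 x 32 recurrent payoff matrix and the pIR target table in place of the toy input.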
A. Dynamics

We would like to observe the dynamics of the two imitation
rules. Upon examining the smoothing function
(2) we distinguish three different cases. In the low noise
limit we have rational players as $\lim_{K \to 0} g(\Delta U) = \Theta(\Delta U)$,
where $\Theta(\cdot)$ is the Heaviside step function. In the high noise
limit we have random drift^1 because $\lim_{K \to \infty} g(\Delta U) = 1/2$.
The third case is the intermediate regime of medium noise.
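The two limits of the smoothing function can be made concrete numerically. The sketch below assumes the form $g(\Delta U) = 1/(1 + \exp(-\Delta U/K))$ of equation (2); the particular values of K and of the payoff difference are arbitrary illustrations.

```python
import math

def g(dU, K):
    """Imitation probability for payoff difference dU at noise factor K."""
    x = dU / K
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)          # x < 0, so exp cannot overflow
    return e / (1.0 + e)

# Low, medium and high noise: g approaches a step function for small K
# and the constant 1/2 for large K.
for K in (0.01, 1.0, 100.0):
    print(K, [round(g(dU, K), 3) for dU in (-2.0, 0.0, 2.0)])
```

At low noise a player almost surely imitates a more successful role model and almost never a less successful one, while at high noise the payoff difference barely matters.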



In figure 2 the concentration of a selection of important 
strategies is given in three cases of low, medium and high 
noise. We refer to the cumulative concentration of always
defecting strategies (see table I) as $\rho_{\text{all-D}}$. When
tIR is used $\rho_{\text{all-D}}$ follows a similar evolution for all three
noise factors. The always defecting strategies die out after
their fraction increases initially to about 0.4. With
more noise this process takes more time. If pIR is used 
we observe a similar evolution for low and medium noise. 
As the noise increases, however, we find that the popu- 
lation is temporarily dominated by the always defecting 
strategies before they die out eventually. By observing 
the fraction of GT players we notice a simultaneous ex- 
tinction of all-D strategies and a rise of the fraction of 
GT players followed by an equilibrium state, dominated 
by the GT strategy. 
FIG. 3. Selection of non-zero equilibrium fractions as function
of noise factor K. Results from numerical integration of the
approximate mean value equations for tIR, equation (3), and
pIR, equation (9), with initial condition $\rho_i(t = 0) = 1/|\mathcal{M}_1|$ for
$i = 1, 2, \ldots, |\mathcal{M}_1|$, where $|\mathcal{M}_1| = 32$ is the number of strategies
in $\mathcal{M}_1$.



B. Equilibrium strategy distribution 



^1 Note that if pIR is used then this random drift is possible only
between certain strategies. 



The results from numerical integration of the approximate
mean value equations (3) and (9) suggest that
if initially all the strategies are present in equal fractions,
the system is likely to reach a stationary equilibrium. We
can see in figure 2 that the equilibrium fraction of GT
varies with the noise factor K. Figure 3 shows the equilibrium
fractions of all nice and retaliating strategies and
the C|CCCC strategy as function of noise factor K for
both imitation rules^2. We denote the equilibrium fraction
of strategy X as $\rho^*_X$. If we refer only to the equilibrium
fraction under traditional (partial) imitation we add a
tIR (pIR) superscript: $\rho_X^{*\mathrm{tIR}}$ ($\rho_X^{*\mathrm{pIR}}$).

GT is the most abundant strategy at equilibrium for
both imitation rules and over the whole reasonable range
of the noise factor K. For traditional imitation we notice
that the lower the noise factor K, the higher the
equilibrium fraction of GT $\rho^*_{\mathrm{GT}}$, and for moderate to high
noise factor, the equilibrium fractions are independent
of the noise factor. These two observations are reversed
for partial imitation, i.e. for pIR, the higher the noise
factor K the higher $\rho^*_{\mathrm{GT}}$, and at low temperatures the
equilibrium fractions are independent of K. We notice
further that for traditional (partial) imitation GT is the
only dominating strategy and $\rho^*_{\mathrm{GT}}$ is very close to 1 if the
noise factor is small (high). With traditional imitation
the equilibrium fractions rank independent of the noise
factor as $\rho^{*\mathrm{tIR}}_{\mathrm{GT}} > \rho^{*\mathrm{tIR}}_{\mathrm{TFT}} > \rho^{*\mathrm{tIR}}_{\mathrm{Pavlov}} > \rho^{*\mathrm{tIR}}_{\mathrm{C|CCDC}} > \rho^{*\mathrm{tIR}}_{\mathrm{C|CCCC}}$.
For partial imitation this is no longer true. Pavlov is now
more abundant than TFT at equilibrium.
FIG. 4. Flow chart of important strategies under pIR. An
arrow $A \xrightarrow{C} B$ is drawn if an A-strategist adopts strategy B
when imitating a C-strategist (or a D-strategist). For simplicity
we write $A \xrightarrow{B} B$ as $A \to B$. If A is a nice and retaliating
strategy and B is one of the two transitional strategies (C|DDDD
or C|CDDD) we also have $A \to B$. These arrows are omitted
to avoid an overly crowded figure. The term all-D is used to
denote all four always defecting strategies.



IV. DISCUSSION 

With pIR there are $|\mathcal{M}_1|^2 = 1024$ possible imitation
processes, but not all of these are important for the evo- 
lution of the population. In an early phase the naive 
strategies die out and the always defecting strategies be- 
come more popular. After this initial phase of evolution 
the nice and retaliating strategies take over. The most 
important transitions are shown in figure 4. We can see 
that some of the always defecting strategies may directly 
be turned into GT or Pavlov by imitating one of the 
nice retaliating strategies, however none of the always 
defecting strategies can be turned into TFT or C|CCDC 
by imitating one of the nice and retaliating strategies. 
In general, it is more difficult for any always defecting 
strategy to be turned into TFT or C|CCDC than to be
turned into Pavlov or GT. While this observation can- 
not explain the fate of all different strategies, it serves 
to explain - together with the initial rise of always de- 
fecting strategies - the lower equilibrium densities of the 
C|CCDC and TFT strategy when pIR is used. From this 
observation we may also conjecture that performance or 
fitness do not assure survival if the strategy cannot be 
easily learned by an important group of strategies. Thus
the "learnability" of a strategy becomes an important
factor that may have strong influence on the competitiveness.
The Pavlov strategy is not successful in the
high noise setting because it scores considerably lower
than GT or TFT in a population with a large fraction of
always defecting strategies.

^2 With pIR and low noise a few other strategies have small
non-zero equilibrium densities, namely C|DDCC, C|DCCC and
C|CDCC. With tIR these three other always cooperating strategies
have the same equilibrium fractions as the C|CCCC strategy.
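The learnability argument for the always defecting strategies can be checked by enumeration. The sketch below is an illustration under the same assumptions as the example in section II B (infinitely many games per encounter, every move the role model actually plays is witnessed); `partial_imitation` is a hypothetical helper, and the four always defecting strategies are taken to be those of the form D|DDxx.

```python
from itertools import product

# Position of the response to each outcome (own move, opponent's move)
# in the five-letter code with the '|' removed.
SLOTS = {"DD": 1, "DC": 2, "CD": 3, "CC": 4}

def partial_imitation(imitator, model):
    """Strategy of the imitator after copying only the witnessed moves of the model."""
    a, b = imitator.replace("|", ""), model.replace("|", "")
    move_a, move_b = a[0], b[0]
    seen, used = set(), {0}
    while (move_a, move_b) not in seen:
        seen.add((move_a, move_b))
        slot_a, slot_b = SLOTS[move_a + move_b], SLOTS[move_b + move_a]
        used.add(slot_b)                      # the model plays this response, so it is witnessed
        move_a, move_b = a[slot_a], b[slot_b]
    new = "".join(b[i] if i in used else a[i] for i in range(5))
    return new[0] + "|" + new[1:]

ALL = ["".join(m[:1]) + "|" + "".join(m[1:]) for m in product("CD", repeat=5)]
N_PROCESSES = len(ALL) ** 2                   # every strategy may imitate every strategy

ALL_D = ["D|DD" + s + r for s, r in product("CD", repeat=2)]
NICE_RETALIATING = {"GT": "C|DDDC", "TFT": "C|DCDC", "Pavlov": "C|CDDC", "C|CCDC": "C|CCDC"}

# Strategies an always defecting player can hold after one imitation step.
reachable = {partial_imitation(d, m) for d in ALL_D for m in NICE_RETALIATING.values()}
```

Under these assumptions an always defecting imitator never sees the role model's response to its own cooperation, so it keeps $S_T$ = D. The reachable set therefore contains GT, Pavlov and the two transitional strategies C|DDDD and C|CDDD, but neither TFT nor C|CCDC.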



V. CONCLUSION 

We have shown that the partial Imitation Rule, an alternative
to the common imitative behaviour for two player
games, can be described by an approximate mean value
equation. Dynamics and stationary properties are in
general subject to many parameters such as payoff pa- 
rameters, noise and initial conditions. Our investigation 
is by no means exhaustive but the approximate mean 
value equation predicts that the evolution of well mixed 
populations depends heavily on the employed imitation 
rule. It is therefore important to discuss the imitation 
behaviour whenever meta-strategies, that are more com- 
plex than simply cooperate and defect, are being used. 

The idea of using tIR or pIR (or any other imitation 
rule) is a question of the model one is trying to investigate.
If we assume that offspring are created from
generation to generation it is meaningful to assume that
they will be using the same strategy as the parent. This
scenario is equivalent to using tIR. If on the other hand
the players are surviving over several generations and are 
using imitation then they should adapt their strategies 
via a learning process (as for example pIR) rather than 
complete imitation or tIR. 
The discussion about the imitative behaviour can also
be taken to the spatial variant of the prisoner dilemma 
game, especially because in the spatial variant with sta- 
tionary topology it is more natural to use imitation rather 
than reproduction. In the spatial variant one can argue 
that the imitator should, at least to some extent, have 
access to information from the games played by the role 
model with other players. We leave these topics to future 
work. 
Appendix A: Macroscopic dynamics

In this section we determine the macroscopic dynam- 
ics of pIR from the microscopic interactions between the 
players. Our approach is mainly based on [6]. For sim- 
plicity we consider only players on a fully connected net- 
work (every player interacts with every other player and 
himself). The transition rate of strategy i to another 
strategy j is given by 

$$w(i \to j) = \rho_j\, g(U_j - U_i) , \tag{A1}$$

where $\rho_j$ is the density of j-strategists, the smoothing
function has been given previously and it is understood
that the payoffs depend on the densities $\rho_1, \rho_2, \ldots, \rho_{|\mathcal{M}_n|}$,
where $|\mathcal{M}_n|$ is the number of possible strategies. The
approximate mean value equation for a strategy i in the
general case is

$$\frac{d\rho_i}{dt} = \sum_j \left[ \rho_j\, w(j \to i) - \rho_i\, w(i \to j) \right] . \tag{A2}$$

Inserting equation (A1) into equation (A2) yields a
non-linear differential equation for the time derivative of
every strategy^3.

However in the case of pIR this does in general not 
yield useful results as it does not take any account of the 
fact that an i-strategist who imitates a j-strategist will 
in general not become a j-strategist himself. In order 
for us to take account of this we need to find a correct 
expression for the term $w(i \to j)$ in equation (A2). We
define the following mapping

$$p: M \times M \times M \to [0, 1] . \tag{A3}$$



^3 Note that if proportional imitation is used rather than smoothed
imitation this procedure simply yields the replicator equation. 



The value $p(k,j,i)$ is the probability that a k-strategist
becomes an i-strategist by imitating a j-strategist. The
transition rate for k-strategists becoming i-strategists
by imitating j-strategists is

$$w(k \xrightarrow{j} i) = \rho_j\, g(U_j - U_k)\, p(k,j,i) . \tag{A4}$$

The total transition rate for k-strategists migrating
to strategy i is thus

$$w(k \to i) = \sum_j \rho_j\, g(U_j - U_k)\, p(k,j,i) , \tag{A5}$$

and the total fraction that migrates to strategy i is

$$f_+(\to i) = \sum_{k \neq i} \rho_k \sum_j \rho_j\, g(U_j - U_k)\, p(k,j,i) = \sum_j \rho_j \sum_{k \neq i} \rho_k\, g(U_j - U_k)\, p(k,j,i) . \tag{A6}$$

In contrast to the case of tIR a second summation over
k appears here. Under pIR children strategies can be different
from both of the parent strategies. Therefore we need
to consider all the strategies (except strategy i) as imita- 
tor when considering the migration to strategy i when the 
role model uses strategy j. This leads immediately to in- 
teresting phenomena as for example the possible rebirth 
of extinct strategies. Next we define a transition rate for 
i-strategists migrating to another strategy by imitating 
strategy j: 

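$$w(i \xrightarrow{j}) = \rho_j\, g(U_j - U_i)\, [1 - p(i,j,i)] , \tag{A7}$$

where the term in brackets takes care of the important
case where the i-strategist learns nothing new by imitating
a j-strategist and therefore keeps his previous strategy.
The fraction migrating away from strategy i is thus

$$f_-(i \to) = \rho_i \sum_j \rho_j\, g(U_j - U_i)\, [1 - p(i,j,i)] . \tag{A8}$$

The approximate mean value equation then becomes

$$\frac{d\rho_i}{dt} = f_+(\to i) - f_-(i \to) = \sum_j \rho_j \sum_{k \neq i} \rho_k\, g(U_j - U_k)\, p(k,j,i) - \rho_i \sum_j \rho_j\, g(U_j - U_i)\, [1 - p(i,j,i)] . \tag{A9}$$

The restrictions on the summation over k and the factor
in brackets are actually not necessary. We rewrite

$$\begin{aligned}
\frac{d\rho_i}{dt} = {} & \sum_j \rho_j \sum_k \rho_k\, g(U_j - U_k)\, p(k,j,i) \\
& - \sum_j \rho_i \rho_j\, g(U_j - U_i)\, p(i,j,i) \\
& - \sum_j \rho_i \rho_j\, g(U_j - U_i) \\
& + \sum_j \rho_i \rho_j\, g(U_j - U_i)\, p(i,j,i)
\end{aligned} \tag{A10}$$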
3 
and see that the terms on the second and fourth line cancel.
Finally we obtain the general approximate mean
value equation:

$$\frac{d\rho_i}{dt} = \sum_j \rho_j \sum_k \rho_k\, g(U_j - U_k)\, p(k,j,i) - \rho_i \sum_j \rho_j\, g(U_j - U_i) . \tag{A11}$$

By specifying the values of p we choose the imitation rule.
For tIR we simply have

$$p_{\mathrm{tIR}}(k,j,i) = \delta_{ij} , \tag{A12}$$

where $\delta$ is the Kronecker delta and the equation reduces
to the approximate mean value equation in [6]. For pIR
we have

$$p_{\mathrm{pIR}}(k,j,i) = \begin{cases} 1 & \text{if a } k\text{-strategist imitating a } j\text{-strategist with pIR becomes an } i\text{-strategist} \\ 0 & \text{otherwise.} \end{cases} \tag{A13}$$
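The reduction to the tIR equation can be written out as a short consistency check. The sketch assumes the tIR choice $p_{\mathrm{tIR}}(k,j,i) = \delta_{ij}$, i.e. the imitator always ends up with the role model's strategy, and inserts it into the general mean value equation derived above:

```latex
\begin{align}
\frac{d\rho_i^{\mathrm{tIR}}}{dt}
  &= \sum_j \rho_j \sum_k \rho_k\, g(U_j - U_k)\, \delta_{ij}
     - \rho_i \sum_j \rho_j\, g(U_j - U_i) \\
  &= \rho_i \sum_k \rho_k\, g(U_i - U_k)
     - \rho_i \sum_j \rho_j\, g(U_j - U_i) \\
  &= \rho_i \sum_j \rho_j \left[ g(U_i - U_j) - g(U_j - U_i) \right] .
\end{align}
```

The Kronecker delta removes the sum over role models in the gain term, and renaming the summation index k to j gives the form of equation (3).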



[1] J. M. Smith, Evolution and the Theory of Games (Cam- 
bridge University Press, 1982). 

[2] R. Axelrod, The Evolution of Cooperation (Basic Books, 
1984). 

[3] J. Hofbauer and K. Sigmund, Evolutionary Games 
and Population Dynamics (Cambridge University Press, 
1998). 

[4] M. A. Nowak, Evolutionary Dynamics: Exploring the 
Equations of Life (Belknap Press of Harvard University 
Press, 2006). 

[5] W. Poundstone, Prisoner's Dilemma: John Von Neu- 
mann, Game Theory and the Puzzle of the Bomb (Dou- 
bleday New York, NY, USA, 1992). 

[6] G. Szabó and G. Fáth, Phys. Rep. 446, 97 (July 2007).

[7] M. A. Nowak and R. M. May, Nature 359, 826 (Oct
1992).

[8] M. A. Nowak, S. Bonhoeffer, and R. M. May, Proc. Nat.
Acad. Sci. U. S. A. 91, 4877 (May 1994).

[9] H. Ohtsuki, C. Hauert, E. Lieberman, and M. A. Nowak,
Nature 441, 502 (May 2006).

[10] M. A. Nowak and R. M. May, Int. J. Bifurcation and
Chaos 3, 35 (1993).

[11] M. A. Nowak and K. Sigmund, Science 303, 793 (February
2004).

[12] J. M. Pacheco, A. Traulsen, H. Ohtsuki, and M. A.
Nowak, J. Theor. Biol. 250, 723 (February 2008).

[13] H. Ohtsuki and M. Nowak, J. Theor. Biol. 251, 698 (April
2008).

[14] M. A. Nowak, Science 314, 1560 (December 2006).

[15] M. A. Nowak, A. Sasaki, C. Taylor, and D. Fudenberg,
Nature 428, 646 (April 2004).

[16] M. A. Nowak and K. Sigmund, Nature 437, 1291 (October
2005).

[17] M. G. Zimmermann and V. M. Eguíluz, Phys. Rev. E 72,
056118 (Nov 2005).

[18] Z.-X. Wu and Y.-H. Wang, Phys. Rev. E 75, 041114 (Apr
2007).

[19] G. Szabó, J. Vukov, and A. Szolnoki, Phys. Rev. E 72,
047107 (Oct 2005).

[20] S.-M. Qin, Y. Chen, X.-Y. Zhao, and J. Shi, Phys. Rev.
E 78, 041129 (Oct 2008).

[21] S. K. Baek and B. J. Kim, Phys. Rev. E 78, 011125 (Jul
2008).

[22] Bukhari, Asian J. Inf. Technol. 8, 866 (2006),
http://www.medwelljournals.com/abstract/?doi=ajit.2006.866.871.

[23] R. Brunauer, A. Locker, H. A. Mayer, G. Mitterlechner,
and H. Payer, in Proc. 2007 ACM Symposium on Appl.
Computing, SAC '07 (ACM, New York, NY, USA, 2007)
pp. 720-727.






